home *** CD-ROM | disk | FTP | other *** search
Text File | 1994-06-09 | 89.1 KB | 1,703 lines |
- <NIC.MERIT.EDU> /introducing.the.internet/intro.to.ip
-
-
-
-
-
-
-
-
-
- Introduction
- to
- the Internet Protocols
-
-
-
-
-
- C R
-
- C S
- Computer Science Facilities Group
- C I
-
- L S
-
-
- RUTGERS
- The State University of New Jersey
-
-
-
-
- 3 July 1987
-
- This is an introduction to the Internet networking protocols (TCP/IP).
- It includes a summary of the facilities available and brief
- descriptions of the major protocols in the family.
-
- Copyright (C) 1987, Charles L. Hedrick. Anyone may reproduce this
- document, in whole or in part, provided that: (1) any copy or
- republication of the entire document must show Rutgers University as
- the source, and must include this notice; and (2) any other use of
- this material must reference this manual and Rutgers University, and
- the fact that the material is copyright by Charles Hedrick and is used
- by permission.
-
-
-
- Unix is a trademark of AT&T Technologies, Inc.
-
-
-
- Table of Contents
-
-
- 1. What is TCP/IP? 1
- 2. General description of the TCP/IP protocols 5
- 2.1 The TCP level 7
- 2.2 The IP level 10
- 2.3 The Ethernet level 11
- 3. Well-known sockets and the applications layer 12
- 3.1 An example application: SMTP 15
- 4. Protocols other than TCP: UDP and ICMP 17
- 5. Keeping track of names and information: the domain system 18
- 6. Routing 20
- 7. Details about Internet addresses: subnets and broadcasting 21
- 8. Datagram fragmentation and reassembly 23
- 9. Ethernet encapsulation: ARP 24
- 10. Getting more information 25
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
- i
-
-
-
- This document is a brief introduction to TCP/IP, followed by advice on
- what to read for more information. This is not intended to be a
- complete description. It can give you a reasonable idea of the
- capabilities of the protocols. But if you need to know any details of
- the technology, you will want to read the standards yourself.
- Throughout the text, you will find references to the standards, in the
- form of "RFC" or "IEN" numbers. These are document numbers. The final
- section of this document tells you how to get copies of those
- standards.
-
-
-
- 1. What is TCP/IP?
-
-
- TCP/IP is a set of protocols developed to allow cooperating computers
- to share resources across a network. It was developed by a community
- of researchers centered around the ARPAnet. Certainly the ARPAnet is
- the best-known TCP/IP network. However as of June, 87, at least 130
- different vendors had products that support TCP/IP, and thousands of
- networks of all kinds use it.
-
- First some basic definitions. The most accurate name for the set of
- protocols we are describing is the "Internet protocol suite". TCP and
- IP are two of the protocols in this suite. (They will be described
- below.) Because TCP and IP are the best known of the protocols, it
- has become common to use the term TCP/IP or IP/TCP to refer to the
- whole family. It is probably not worth fighting this habit. However
- this can lead to some oddities. For example, I find myself talking
- about NFS as being based on TCP/IP, even though it doesn't use TCP at
- all. (It does use IP. But it uses an alternative protocol, UDP,
- instead of TCP. All of this alphabet soup will be unscrambled in the
- following pages.)
-
- The Internet is a collection of networks, including the Arpanet,
- NSFnet, regional networks such as NYsernet, local networks at a number
- of University and research institutions, and a number of military
- networks. The term "Internet" applies to this entire set of networks.
- The subset of them that is managed by the Department of Defense is
- referred to as the "DDN" (Defense Data Network). This includes some
- research-oriented networks, such as the Arpanet, as well as more
- strictly military ones. (Because much of the funding for Internet
- protocol developments is done via the DDN organization, the terms
- Internet and DDN can sometimes seem equivalent.) All of these
- networks are connected to each other. Users can send messages from
- any of them to any other, except where there are security or other
- policy restrictions on access. Officially speaking, the Internet
- protocol documents are simply standards adopted by the Internet
- community for its own use. More recently, the Department of Defense
- issued a MILSPEC definition of TCP/IP. This was intended to be a more
- formal definition, appropriate for use in purchasing specifications.
- However most of the TCP/IP community continues to use the Internet
- standards. The MILSPEC version is intended to be consistent with it.
-
- Whatever it is called, TCP/IP is a family of protocols. A few provide
- 1
-
-
-
- "low-level" functions needed for many applications. These include IP,
- TCP, and UDP. (These will be described in a bit more detail later.)
- Others are protocols for doing specific tasks, e.g. transferring files
- between computers, sending mail, or finding out who is logged in on
- another computer. Initially TCP/IP was used mostly between
- minicomputers or mainframes. These machines had their own disks, and
- generally were self-contained. Thus the most important "traditional"
- TCP/IP services are:
-
- - file transfer. The file transfer protocol (FTP) allows a user on
- any computer to get files from another computer, or to send files
- to another computer. Security is handled by requiring the user
- to specify a user name and password for the other computer.
- Provisions are made for handling file transfer between machines
- with different character set, end of line conventions, etc. This
- is not quite the same thing as more recent "network file system"
- or "netbios" protocols, which will be described below. Rather,
- FTP is a utility that you run any time you want to access a file
- on another system. You use it to copy the file to your own
- system. You then work with the local copy. (See RFC 959 for
- specifications for FTP.)
-
- - remote login. The network terminal protocol (TELNET) allows a
- user to log in on any other computer on the network. You start a
- remote session by specifying a computer to connect to. From that
- time until you finish the session, anything you type is sent to
- the other computer. Note that you are really still talking to
- your own computer. But the telnet program effectively makes your
- computer invisible while it is running. Every character you type
- is sent directly to the other system. Generally, the connection
- to the remote computer behaves much like a dialup connection.
- That is, the remote system will ask you to log in and give a
- password, in whatever manner it would normally ask a user who had
- just dialed it up. When you log off of the other computer, the
- telnet program exits, and you will find yourself talking to your
- own computer. Microcomputer implementations of telnet generally
- include a terminal emulator for some common type of terminal.
- (See RFC's 854 and 855 for specifications for telnet. By the
- way, the telnet protocol should not be confused with Telenet, a
- vendor of commercial network services.)
-
- - computer mail. This allows you to send messages to users on
- other computers. Originally, people tended to use only one or
- two specific computers. They would maintain "mail files" on
- those machines. The computer mail system is simply a way for you
- to add a message to another user's mail file. There are some
- problems with this in an environment where microcomputers are
- used. The most serious is that a micro is not well suited to
- receive computer mail. When you send mail, the mail software
- expects to be able to open a connection to the addressee's
- computer, in order to send the mail. If this is a microcomputer,
- it may be turned off, or it may be running an application other
- than the mail system. For this reason, mail is normally handled
- by a larger system, where it is practical to have a mail server
- running all the time. Microcomputer mail software then becomes a
- 2
-
-
-
- user interface that retrieves mail from the mail server. (See
- RFC 821 and 822 for specifications for computer mail. See RFC
- 937 for a protocol designed for microcomputers to use in reading
- mail from a mail server.)
-
- These services should be present in any implementation of TCP/IP,
- except that micro-oriented implementations may not support computer
- mail. These traditional applications still play a very important role
- in TCP/IP-based networks. However more recently, the way in which
- networks are used has been changing. The older model of a number of
- large, self-sufficient computers is beginning to change. Now many
- installations have several kinds of computers, including
- microcomputers, workstations, minicomputers, and mainframes. These
- computers are likely to be configured to perform specialized tasks.
- Although people are still likely to work with one specific computer,
- that computer will call on other systems on the net for specialized
- services. This has led to the "server/client" model of network
- services. A server is a system that provides a specific service for
- the rest of the network. A client is another system that uses that
- service. (Note that the server and client need not be on different
- computers. They could be different programs running on the same
- computer.) Here are the kinds of servers typically present in a
- modern computer setup. Note that these computer services can all be
- provided within the framework of TCP/IP.
-
- - network file systems. This allows a system to access files on
- another computer in a somewhat more closely integrated fashion
- than FTP. A network file system provides the illusion that disks
- or other devices from one system are directly connected to other
- systems. There is no need to use a special network utility to
- access a file on another system. Your computer simply thinks it
- has some extra disk drives. These extra "virtual" drives refer
- to the other system's disks. This capability is useful for
- several different purposes. It lets you put large disks on a few
- computers, but still give others access to the disk space. Aside
- from the obvious economic benefits, this allows people working on
- several computers to share common files. It makes system
- maintenance and backup easier, because you don't have to worry
- about updating and backing up copies on lots of different
- machines. A number of vendors now offer high-performance
- diskless computers. These computers have no disk drives at all.
- They are entirely dependent upon disks attached to common "file
- servers". (See RFC's 1001 and 1002 for a description of
- PC-oriented NetBIOS over TCP. In the workstation and
- minicomputer area, Sun's Network File System is more likely to be
- used. Protocol specifications for it are available from Sun
- Microsystems.)
-
- - remote printing. This allows you to access printers on other
- computers as if they were directly attached to yours. (The most
- commonly used protocol is the remote lineprinter protocol from
- Berkeley Unix. Unfortunately, there is no protocol document for
- this. However the C code is easily obtained from Berkeley, so
- implementations are common.)
-
- 3
-
-
-
- - remote execution. This allows you to request that a particular
- program be run on a different computer. This is useful when you
- can do most of your work on a small computer, but a few tasks
- require the resources of a larger system. There are a number of
- different kinds of remote execution. Some operate on a command
- by command basis. That is, you request that a specific command
- or set of commands should run on some specific computer. (More
- sophisticated versions will choose a system that happens to be
- free.) However there are also "remote procedure call" systems
- that allow a program to call a subroutine that will run on
- another computer. (There are many protocols of this sort.
- Berkeley Unix contains two servers to execute commands remotely:
- rsh and rexec. The man pages describe the protocols that they
- use. The user-contributed software with Berkeley 4.3 contains a
- "distributed shell" that will distribute tasks among a set of
- systems, depending upon load. Remote procedure call mechanisms
- have been a topic for research for a number of years, so many
- organizations have implementations of such facilities. The most
- widespread commercially-supported remote procedure call protocols
- seem to be Xerox's Courier and Sun's RPC. Protocol documents are
- available from Xerox and Sun. There is a public implementation
- of Courier over TCP as part of the user-contributed software with
- Berkeley 4.3. An implementation of RPC was posted to Usenet by
- Sun, and also appears as part of the user-contributed software
- with Berkeley 4.3.)
-
- - name servers. In large installations, there are a number of
- different collections of names that have to be managed. This
- includes users and their passwords, names and network addresses
- for computers, and accounts. It becomes very tedious to keep
- this data up to date on all of the computers. Thus the databases
- are kept on a small number of systems. Other systems access the
- data over the network. (RFC 822 and 823 describe the name server
- protocol used to keep track of host names and Internet addresses
- on the Internet. This is now a required part of any TCP/IP
- implementation. IEN 116 describes an older name server protocol
- that is used by a few terminal servers and other products to look
- up host names. Sun's Yellow Pages system is designed as a
- general mechanism to handle user names, file sharing groups, and
- other databases commonly used by Unix systems. It is widely
- available commercially. Its protocol definition is available
- from Sun.)
-
- - terminal servers. Many installations no longer connect terminals
- directly to computers. Instead they connect them to terminal
- servers. A terminal server is simply a small computer that only
- knows how to run telnet (or some other protocol to do remote
- login). If your terminal is connected to one of these, you
- simply type the name of a computer, and you are connected to it.
- Generally it is possible to have active connections to more than
- one computer at the same time. The terminal server will have
- provisions to switch between connections rapidly, and to notify
- you when output is waiting for another connection. (Terminal
- servers use the telnet protocol, already mentioned. However any
- real terminal server will also have to support name service and a
- 4
-
-
-
- number of other protocols.)
-
- - network-oriented window systems. Until recently, high-
- performance graphics programs had to execute on a computer that
- had a bit-mapped graphics screen directly attached to it.
- Network window systems allow a program to use a display on a
- different computer. Full-scale network window systems provide an
- interface that lets you distribute jobs to the systems that are
- best suited to handle them, but still give you a single
- graphically-based user interface. (The most widely-implemented
- window system is X. A protocol description is available from
- MIT's Project Athena. A reference implementation is publically
- available from MIT. A number of vendors are also supporting
- NeWS, a window system defined by Sun. Both of these systems are
- designed to use TCP/IP.)
-
- Note that some of the protocols described above were designed by
- Berkeley, Sun, or other organizations. Thus they are not officially
- part of the Internet protocol suite. However they are implemented
- using TCP/IP, just as normal TCP/IP application protocols are. Since
- the protocol definitions are not considered proprietary, and since
- commercially-support implementations are widely available, it is
- reasonable to think of these protocols as being effectively part of
- the Internet suite. Note that the list above is simply a sample of
- the sort of services available through TCP/IP. However it does
- contain the majority of the "major" applications. The other
- commonly-used protocols tend to be specialized facilities for getting
- information of various kinds, such as who is logged in, the time of
- day, etc. However if you need a facility that is not listed here, we
- encourage you to look through the current edition of Internet
- Protocols (currently RFC 1011), which lists all of the available
- protocols, and also to look at some of the major TCP/IP
- implementations to see what various vendors have added.
-
-
-
- 2. General description of the TCP/IP protocols
-
-
- TCP/IP is a layered set of protocols. In order to understand what
- this means, it is useful to look at an example. A typical situation
- is sending mail. First, there is a protocol for mail. This defines a
- set of commands which one machine sends to another, e.g. commands to
- specify who the sender of the message is, who it is being sent to, and
- then the text of the message. However this protocol assumes that
- there is a way to communicate reliably between the two computers.
- Mail, like other application protocols, simply defines a set of
- commands and messages to be sent. It is designed to be used together
- with TCP and IP. TCP is responsible for making sure that the commands
- get through to the other end. It keeps track of what is sent, and
- retransmitts anything that did not get through. If any message is too
- large for one datagram, e.g. the text of the mail, TCP will split it
- up into several datagrams, and make sure that they all arrive
- correctly. Since these functions are needed for many applications,
- they are put together into a separate protocol, rather than being part
- 5
-
-
-
- of the specifications for sending mail. You can think of TCP as
- forming a library of routines that applications can use when they need
- reliable network communications with another computer. Similarly, TCP
- calls on the services of IP. Although the services that TCP supplies
- are needed by many applications, there are still some kinds of
- applications that don't need them. However there are some services
- that every application needs. So these services are put together into
- IP. As with TCP, you can think of IP as a library of routines that
- TCP calls on, but which is also available to applications that don't
- use TCP. This strategy of building several levels of protocol is
- called "layering". We think of the applications programs such as
- mail, TCP, and IP, as being separate "layers", each of which calls on
- the services of the layer below it. Generally, TCP/IP applications
- use 4 layers:
-
- - an application protocol such as mail
-
- - a protocol such as TCP that provides services need by many
- applications
-
- - IP, which provides the basic service of getting datagrams to
- their destination
-
- - the protocols needed to manage a specific physical medium, such
- as Ethernet or a point to point line.
-
- TCP/IP is based on the "catenet model". (This is described in more
- detail in IEN 48.) This model assumes that there are a large number
- of independent networks connected together by gateways. The user
- should be able to access computers or other resources on any of these
- networks. Datagrams will often pass through a dozen different
- networks before getting to their final destination. The routing
- needed to accomplish this should be completely invisible to the user.
- As far as the user is concerned, all he needs to know in order to
- access another system is an "Internet address". This is an address
- that looks like 128.6.4.194. It is actually a 32-bit number. However
- it is normally written as 4 decimal numbers, each representing 8 bits
- of the address. (The term "octet" is used by Internet documentation
- for such 8-bit chunks. The term "byte" is not used, because TCP/IP is
- supported by some computers that have byte sizes other than 8 bits.)
- Generally the structure of the address gives you some information
- about how to get to the system. For example, 128.6 is a network
- number assigned by a central authority to Rutgers University. Rutgers
- uses the next octet to indicate which of the campus Ethernets is
- involved. 128.6.4 happens to be an Ethernet used by the Computer
- Science Department. The last octet allows for up to 254 systems on
- each Ethernet. (It is 254 because 0 and 255 are not allowed, for
- reasons that will be discussed later.) Note that 128.6.4.194 and
- 128.6.5.194 would be different systems. The structure of an Internet
- address is described in a bit more detail later.
-
- Of course we normally refer to systems by name, rather than by
- Internet address. When we specify a name, the network software looks
- it up in a database, and comes up with the corresponding Internet
- address. Most of the network software deals strictly in terms of the
- 6
-
-
-
- address. (RFC 882 describes the name server technology used to handle
- this lookup.)
-
- TCP/IP is built on "connectionless" technology. Information is
- transfered as a sequence of "datagrams". A datagram is a collection
- of data that is sent as a single message. Each of these datagrams is
- sent through the network individually. There are provisions to open
- connections (i.e. to start a conversation that will continue for some
- time). However at some level, information from those connections is
- broken up into datagrams, and those datagrams are treated by the
- network as completely separate. For example, suppose you want to
- transfer a 15000 octet file. Most networks can't handle a 15000 octet
- datagram. So the protocols will break this up into something like 30
- 500-octet datagrams. Each of these datagrams will be sent to the
- other end. At that point, they will be put back together into the
- 15000-octet file. However while those datagrams are in transit, the
- network doesn't know that there is any connection between them. It is
- perfectly possible that datagram 14 will actually arrive before
- datagram 13. It is also possible that somewhere in the network, an
- error will occur, and some datagram won't get through at all. In that
- case, that datagram has to be sent again.
-
- Note by the way that the terms "datagram" and "packet" often seem to
- be nearly interchangable. Technically, datagram is the right word to
- use when describing TCP/IP. A datagram is a unit of data, which is
- what the protocols deal with. A packet is a physical thing, appearing
- on an Ethernet or some wire. In most cases a packet simply contains a
- datagram, so there is very little difference. However they can
- differ. When TCP/IP is used on top of X.25, the X.25 interface breaks
- the datagrams up into 128-byte packets. This is invisible to IP,
- because the packets are put back together into a single datagram at
- the other end before being processed by TCP/IP. So in this case, one
- IP datagram would be carried by several packets. However with most
- media, there are efficiency advantages to sending one datagram per
- packet, and so the distinction tends to vanish.
-
-
-
- 2.1 The TCP level
-
-
- Two separate protocols are involved in handling TCP/IP datagrams. TCP
- (the "transmission control protocol") is responsible for breaking up
- the message into datagrams, reassembling them at the other end,
- resending anything that gets lost, and putting things back in the
- right order. IP (the "internet protocol") is responsible for routing
- individual datagrams. It may seem like TCP is doing all the work.
- And in small networks that is true. However in the Internet, simply
- getting a datagram to its destination can be a complex job. A
- connection may require the datagram to go through several networks at
- Rutgers, a serial line to the John von Neuman Supercomputer Center, a
- couple of Ethernets there, a series of 56Kbaud phone lines to another
- NSFnet site, and more Ethernets on another campus. Keeping track of
- the routes to all of the destinations and handling incompatibilities
- among different transport media turns out to be a complex job. Note
- 7
-
-
-
- that the interface between TCP and IP is fairly simple. TCP simply
- hands IP a datagram with a destination. IP doesn't know how this
- datagram relates to any datagram before it or after it.
-
- It may have occurred to you that something is missing here. We have
- talked about Internet addresses, but not about how you keep track of
- multiple connections to a given system. Clearly it isn't enough to
- get a datagram to the right destination. TCP has to know which
- connection this datagram is part of. This task is referred to as
- "demultiplexing." In fact, there are several levels of demultiplexing
- going on in TCP/IP. The information needed to do this demultiplexing
- is contained in a series of "headers". A header is simply a few extra
- octets tacked onto the beginning of a datagram by some protocol in
- order to keep track of it. It's a lot like putting a letter into an
- envelope and putting an address on the outside of the envelope.
- Except with modern networks it happens several times. It's like you
- put the letter into a little envelope, your secretary puts that into a
- somewhat bigger envelope, the campus mail center puts that envelope
- into a still bigger one, etc. Here is an overview of the headers that
- get stuck on a message that passes through a typical TCP/IP network:
-
- We start with a single data stream, say a file you are trying to send
- to some other computer:
-
- ......................................................
-
- TCP breaks it up into manageable chunks. (In order to do this, TCP
- has to know how large a datagram your network can handle. Actually,
- the TCP's at each end say how big a datagram they can handle, and then
- they pick the smallest size.)
-
- .... .... .... .... .... .... .... ....
-
- TCP puts a header at the front of each datagram. This header actually
- contains at least 20 octets, but the most important ones are a source
- and destination "port number" and a "sequence number". The port
- numbers are used to keep track of different conversations. Suppose 3
- different people are transferring files. Your TCP might allocate port
- numbers 1000, 1001, and 1002 to these transfers. When you are sending
- a datagram, this becomes the "source" port number, since you are the
- source of the datagram. Of course the TCP at the other end has
- assigned a port number of its own for the conversation. Your TCP has
- to know the port number used by the other end as well. (It finds out
- when the connection starts, as we will explain below.) It puts this
- in the "destination" port field. Of course if the other end sends a
- datagram back to you, the source and destination port numbers will be
- reversed, since then it will be the source and you will be the
- destination. Each datagram has a sequence number. This is used so
- that the other end can make sure that it gets the datagrams in the
- right order, and that it hasn't missed any. (See the TCP
- specification for details.) TCP doesn't number the datagrams, but the
- octets. So if there are 500 octets of data in each datagram, the
- first datagram might be numbered 0, the second 500, the next 1000, the
- next 1500, etc. Finally, I will mention the Checksum. This is a
- number that is computed by adding up all the octets in the datagram
- 8
-
-
-
- (more or less - see the TCP spec). The result is put in the header.
- TCP at the other end computes the checksum again. If they disagree,
- then something bad happened to the datagram in transmission, and it is
- thrown away. So here's what the datagram looks like now.
-
- +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
- | Source Port | Destination Port |
- +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
- | Sequence Number |
- +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
- | Acknowledgment Number |
- +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
- | Data | |U|A|P|R|S|F| |
- | Offset| Reserved |R|C|S|S|Y|I| Window |
- | | |G|K|H|T|N|N| |
- +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
- | Checksum | Urgent Pointer |
- +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
- | your data ... next 500 octets |
- | ...... |
-
- If we abbreviate the TCP header as "T", the whole file now looks like
- this:
-
- T.... T.... T.... T.... T.... T.... T....
-
- You will note that there are items in the header that I have not
- described above. They are generally involved with managing the
- connection. In order to make sure the datagram has arrived at its
- destination, the recipient has to send back an "acknowledgement".
- This is a datagram whose "Acknowledgement number" field is filled in.
- For example, sending a packet with an acknowledgement of 1500
- indicates that you have received all the data up to octet number 1500.
- If the sender doesn't get an acknowledgement within a reasonable
- amount of time, it sends the data again. The window is used to
- control how much data can be in transit at any one time. It is not
- practical to wait for each datagram to be acknowledged before sending
- the next one. That would slow things down too much. On the other
- hand, you can't just keep sending, or a fast computer might overrun
- the capacity of a slow one to absorb data. Thus each end indicates
- how much new data it is currently prepared to absorb by putting the
- number of octets in its "Window" field. As the computer receives
- data, the amount of space left in its window decreases. When it goes
- to zero, the sender has to stop. As the receiver processes the data,
- it increases its window, indicating that it is ready to accept more
- data. Often the same datagram can be used to acknowledge receipt of a
- set of data and to give permission for additional new data (by an
- updated window). The "Urgent" field allows one end to tell the other
- to skip ahead in its processing to a particular octet. This is often
- useful for handling asynchronous events, for example when you type a
- control character or other command that interrupts output. The other
- fields are beyond the scope of this document.
-
-
-
- 9
-
-
-
- 2.2 The IP level
-
-
- TCP sends each of these datagrams to IP. Of course it has to tell IP
- the Internet address of the computer at the other end. Note that this
- is all IP is concerned about. It doesn't care about what is in the
- datagram, or even in the TCP header. IP's job is simply to find a
- route for the datagram and get it to the other end. In order to allow
- gateways or other intermediate systems to forward the datagram, it
- adds its own header. The main things in this header are the source
- and destination Internet address (32-bit addresses, like 128.6.4.194),
- the protocol number, and another checksum. The source Internet
- address is simply the address of your machine. (This is necessary so
- the other end knows where the datagram came from.) The destination
- Internet address is the address of the other machine. (This is
- necessary so any gateways in the middle know where you want the
- datagram to go.) The protocol number tells IP at the other end to
- send the datagram to TCP. Although most IP traffic uses TCP, there
- are other protocols that can use IP, so you have to tell IP which
- protocol to send the datagram to. Finally, the checksum allows IP at
- the other end to verify that the header wasn't damaged in transit.
- Note that TCP and IP have separate checksums. IP needs to be able to
- verify that the header didn't get damaged in transit, or it could send
- a message to the wrong place. For reasons not worth discussing here,
- it is both more efficient and safer to have TCP compute a separate
- checksum for the TCP header and data. Once IP has tacked on its
- header, here's what the message looks like:
-
- +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
- |Version| IHL |Type of Service| Total Length |
- +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
- | Identification |Flags| Fragment Offset |
- +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
- | Time to Live | Protocol | Header Checksum |
- +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
- | Source Address |
- +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
- | Destination Address |
- +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
- | TCP header, then your data ...... |
- | |
-
- If we represent the IP header by an "I", your file now looks like
- this:
-
- IT.... IT.... IT.... IT.... IT.... IT.... IT....
-
- Again, the header contains some additional fields that have not been
- discussed. Most of them are beyond the scope of this document. The
- flags and fragment offset are used to keep track of the pieces when a
- datagram has to be split up. This can happen when datagrams are
- forwarded through a network for which they are too big. (This will be
- discussed a bit more below.) The time to live is a number that is
- decremented whenever the datagram passes through a system. When it
- goes to zero, the datagram is discarded. This is done in case a loop
- 10
-
-
-
- develops in the system somehow. Of course this should be impossible,
- but well-designed networks are built to cope with "impossible"
- conditions.
-
- At this point, it's possible that no more headers are needed. If your
- computer happens to have a direct phone line connecting it to the
- destination computer, or to a gateway, it may simply send the
- datagrams out on the line (though likely a synchronous protocol such
- as HDLC would be used, and it would add at least a few octets at the
- beginning and end).
-
-
-
- 2.3 The Ethernet level
-
-
- However most of our networks these days use Ethernet. So now we have
- to describe Ethernet's headers. Unfortunately, Ethernet has its own
- addresses. The people who designed Ethernet wanted to make sure that
- no two machines would end up with the same Ethernet address.
- Furthermore, they didn't want the user to have to worry about
- assigning addresses. So each Ethernet controller comes with an
- address builtin from the factory. In order to make sure that they
- would never have to reuse addresses, the Ethernet designers allocated
- 48 bits for the Ethernet address. People who make Ethernet equipment
- have to register with a central authority, to make sure that the
- numbers they assign don't overlap any other manufacturer. Ethernet is
- a "broadcast medium". That is, it is in effect like an old party line
- telephone. When you send a packet out on the Ethernet, every machine
- on the network sees the packet. So something is needed to make sure
- that the right machine gets it. As you might guess, this involves the
- Ethernet header. Every Ethernet packet has a 14-octet header that
- includes the source and destination Ethernet address, and a type code.
- Each machine is supposed to pay attention only to packets with its own
- Ethernet address in the destination field. (It's perfectly possible
- to cheat, which is one reason that Ethernet communications are not
- terribly secure.) Note that there is no connection between the
- Ethernet address and the Internet address. Each machine has to have a
- table of what Ethernet address corresponds to what Internet address.
- (We will describe how this table is constructed a bit later.) In
- addition to the addresses, the header contains a type code. The type
- code is to allow for several different protocol families to be used on
- the same network. So you can use TCP/IP, DECnet, Xerox NS, etc. at
- the same time. Each of them will put a different value in the type
- field. Finally, there is a checksum. The Ethernet controller
- computes a checksum of the entire packet. When the other end receives
- the packet, it recomputes the checksum, and throws the packet away if
- the answer disagrees with the original. The checksum is put on the
- end of the packet, not in the header. The final result is that your
- message looks like this:
-
-
-
-
-
- 11
-
-
-
- +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
- | Ethernet destination address (first 32 bits) |
- +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
- | Ethernet dest (last 16 bits) |Ethernet source (first 16 bits)|
- +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
- | Ethernet source address (last 32 bits) |
- +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
- | Type code |
- +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
- | IP header, then TCP header, then your data |
- | |
- ...
- | |
- | end of your data |
- +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
- | Ethernet Checksum |
- +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
-
- If we represent the Ethernet header with "E", and the Ethernet
- checksum with "C", your file now looks like this:
-
- EIT....C EIT....C EIT....C EIT....C EIT....C
-
- When these packets are received by the other end, of course all the
- headers are removed. The Ethernet interface removes the Ethernet
- header and the checksum. It looks at the type code. Since the type
- code is the one assigned to IP, the Ethernet device driver passes the
- datagram up to IP. IP removes the IP header. It looks at the IP
- protocol field. Since the protocol type is TCP, it passes the
- datagram up to TCP. TCP now looks at the sequence number. It uses
- the sequence numbers and other information to combine all the
- datagrams into the original file.
-
- The ends our initial summary of TCP/IP. There are still some crucial
- concepts we haven't gotten to, so we'll now go back and add details in
- several areas. (For detailed descriptions of the items discussed here
- see, RFC 793 for TCP, RFC 791 for IP, and RFC's 894 and 826 for
- sending IP over Ethernet.)
-
-
-
- 3. Well-known sockets and the applications layer
-
-
- So far, we have described how a stream of data is broken up into
- datagrams, sent to another computer, and put back together. However
- something more is needed in order to accomplish anything useful.
- There has to be a way for you to open a connection to a specified
- computer, log into it, tell it what file you want, and control the
- transmission of the file. (If you have a different application in
- mind, e.g. computer mail, some analogous protocol is needed.) This is
- done by "application protocols". The application protocols run "on
- top" of TCP/IP. That is, when they want to send a message, they give
- the message to TCP. TCP makes sure it gets delivered to the other
- end. Because TCP and IP take care of all the networking details, the
- 12
-
-
-
- applications protocols can treat a network connection as if it were a
- simple byte stream, like a terminal or phone line.
-
- Before going into more details about applications programs, we have to
- describe how you find an application. Suppose you want to send a file
- to a computer whose Internet address is 128.6.4.7. To start the
- process, you need more than just the Internet address. You have to
- connect to the FTP server at the other end. In general, network
- programs are specialized for a specific set of tasks. Most systems
- have separate programs to handle file transfers, remote terminal
- logins, mail, etc. When you connect to 128.6.4.7, you have to specify
- that you want to talk to the FTP server. This is done by having
- "well-known sockets" for each server. Recall that TCP uses port
- numbers to keep track of individual conversations. User programs
- normally use more or less random port numbers. However specific port
- numbers are assigned to the programs that sit waiting for requests.
- For example, if you want to send a file, you will start a program
- called "ftp". It will open a connection using some random number, say
- 1234, for the port number on its end. However it will specify port
- number 21 for the other end. This is the official port number for the
- FTP server. Note that there are two different programs involved. You
- run ftp on your side. This is a program designed to accept commands
- from your terminal and pass them on to the other end. The program
- that you talk to on the other machine is the FTP server. It is
- designed to accept commands from the network connection, rather than
- an interactive terminal. There is no need for your program to use a
- well-known socket number for itself. Nobody is trying to find it.
- However the servers have to have well-known numbers, so that people
- can open connections to them and start sending them commands. The
- official port numbers for each program are given in "Assigned
- Numbers".
-
- Note that a connection is actually described by a set of 4 numbers:
- the Internet address at each end, and the TCP port number at each end.
- Every datagram has all four of those numbers in it. (The Internet
- addresses are in the IP header, and the TCP port numbers are in the
- TCP header.) In order to keep things straight, no two connections can
- have the same set of numbers. However it is enough for any one number
- to be different. For example, it is perfectly possible for two
- different users on a machine to be sending files to the same other
- machine. This could result in connections with the following
- parameters:
-
- Internet addresses TCP ports
- connection 1 128.6.4.194, 128.6.4.7 1234, 21
- connection 2 128.6.4.194, 128.6.4.7 1235, 21
-
- Since the same machines are involved, the Internet addresses are the
- same. Since they are both doing file transfers, one end of the
- connection involves the well-known port number for FTP. The only
- thing that differs is the port number for the program that the users
- are running. That's enough of a difference. Generally, at least one
- end of the connection asks the network software to assign it a port
- number that is guaranteed to be unique. Normally, it's the user's
- end, since the server has to use a well-known number.
- 13
-
-
-
- Now that we know how to open connections, let's get back to the
- applications programs. As mentioned earlier, once TCP has opened a
- connection, we have something that might as well be a simple wire.
- All the hard parts are handled by TCP and IP. However we still need
- some agreement as to what we send over this connection. In effect
- this is simply an agreement on what set of commands the application
- will understand, and the format in which they are to be sent.
- Generally, what is sent is a combination of commands and data. They
- use context to differentiate. For example, the mail protocol works
- like this: Your mail program opens a connection to the mail server at
- the other end. Your program gives it your machine's name, the sender
- of the message, and the recipients you want it sent to. It then sends
- a command saying that it is starting the message. At that point, the
- other end stops treating what it sees as commands, and starts
- accepting the message. Your end then starts sending the text of the
- message. At the end of the message, a special mark is sent (a dot in
- the first column). After that, both ends understand that your program
- is again sending commands. This is the simplest way to do things, and
- the one that most applications use.
-
- File transfer is somewhat more complex. The file transfer protocol
- involves two different connections. It starts out just like mail.
- The user's program sends commands like "log me in as this user", "here
- is my password", "send me the file with this name". However once the
- command to send data is sent, a second connection is opened for the
- data itself. It would certainly be possible to send the data on the
- same connection, as mail does. However file transfers often take a
- long time. The designers of the file transfer protocol wanted to
- allow the user to continue issuing commands while the transfer is
- going on. For example, the user might make an inquiry, or he might
- abort the transfer. Thus the designers felt it was best to use a
- separate connection for the data and leave the original command
- connection for commands. (It is also possible to open command
- connections to two different computers, and tell them to send a file
- from one to the other. In that case, the data couldn't go over the
- command connection.)
-
- Remote terminal connections use another mechanism still. For remote
- logins, there is just one connection. It normally sends data. When
- it is necessary to send a command (e.g. to set the terminal type or to
- change some mode), a special character is used to indicate that the
- next character is a command. If the user happens to type that special
- character as data, two of them are sent.
-
- We are not going to describe the application protocols in detail in
- this document. It's better to read the RFC's yourself. However there
- are a couple of common conventions used by applications that will be
- described here. First, the common network representation: TCP/IP is
- intended to be usable on any computer. Unfortunately, not all
- computers agree on how data is represented. There are differences in
- character codes (ASCII vs. EBCDIC), in end of line conventions
- (carriage return, line feed, or a representation using counts), and in
- whether terminals expect characters to be sent individually or a line
- at a time. In order to allow computers of different kinds to
- communicate, each applications protocol defines a standard
- 14
-
-
-
- representation. Note that TCP and IP do not care about the
- representation. TCP simply sends octets. However the programs at
- both ends have to agree on how the octets are to be interpreted. The
- RFC for each application specifies the standard representation for
- that application. Normally it is "net ASCII". This uses ASCII
- characters, with end of line denoted by a carriage return followed by
- a line feed. For remote login, there is also a definition of a
- "standard terminal", which turns out to be a half-duplex terminal with
- echoing happening on the local machine. Most applications also make
- provisions for the two computers to agree on other representations
- that they may find more convenient. For example, PDP-10's have 36-bit
- words. There is a way that two PDP-10's can agree to send a 36-bit
- binary file. Similarly, two systems that prefer full-duplex terminal
- conversations can agree on that. However each application has a
- standard representation, which every machine must support.
-
-
-
- 3.1 An example application: SMTP
-
-
- In order to give a bit better idea what is involved in the application
- protocols, I'm going to show an example of SMTP, which is the mail
- protocol. (SMTP is "simple mail transfer protocol.) We assume that a
- computer called TOPAZ.RUTGERS.EDU wants to send the following message.
-
- Date: Sat, 27 Jun 87 13:26:31 EDT
- From: hedrick@topaz.rutgers.edu
- To: levy@red.rutgers.edu
- Subject: meeting
-
- Let's get together Monday at 1pm.
-
- First, note that the format of the message itself is described by an
- Internet standard (RFC 822). The standard specifies the fact that the
- message must be transmitted as net ASCII (i.e. it must be ASCII, with
- carriage return/linefeed to delimit lines). It also describes the
- general structure, as a group of header lines, then a blank line, and
- then the body of the message. Finally, it describes the syntax of the
- header lines in detail. Generally they consist of a keyword and then
- a value.
-
- Note that the addressee is indicated as LEVY@RED.RUTGERS.EDU.
- Initially, addresses were simply "person at machine". However recent
- standards have made things more flexible. There are now provisions
- for systems to handle other systems' mail. This can allow automatic
- forwarding on behalf of computers not connected to the Internet. It
- can be used to direct mail for a number of systems to one central mail
- server. Indeed there is no requirement that an actual computer by the
- name of RED.RUTGERS.EDU even exist. The name servers could be set up
- so that you mail to department names, and each department's mail is
- routed automatically to an appropriate computer. It is also possible
- that the part before the @ is something other than a user name. It is
- possible for programs to be set up to process mail. There are also
- provisions to handle mailing lists, and generic names such as
- 15
-
-
-
- "postmaster" or "operator".
-
- The way the message is to be sent to another system is described by
- RFC's 821 and 974. The program that is going to be doing the sending
- asks the name server several queries to determine where to route the
- message. The first query is to find out which machines handle mail
- for the name RED.RUTGERS.EDU. In this case, the server replies that
- RED.RUTGERS.EDU handles its own mail. The program then asks for the
- address of RED.RUTGERS.EDU, which is 128.6.4.2. Then the mail program
- opens a TCP connection to port 25 on 128.6.4.2. Port 25 is the
- well-known socket used for receiving mail. Once this connection is
- established, the mail program starts sending commands. Here is a
- typical conversation. Each line is labelled as to whether it is from
- TOPAZ or RED. Note that TOPAZ initiated the connection:
-
- RED 220 RED.RUTGERS.EDU SMTP Service at 29 Jun 87 05:17:18 EDT
- TOPAZ HELO topaz.rutgers.edu
- RED 250 RED.RUTGERS.EDU - Hello, TOPAZ.RUTGERS.EDU
- TOPAZ MAIL From:<hedrick@topaz.rutgers.edu>
- RED 250 MAIL accepted
- TOPAZ RCPT To:<levy@red.rutgers.edu>
- RED 250 Recipient accepted
- TOPAZ DATA
- RED 354 Start mail input; end with <CRLF>.<CRLF>
- TOPAZ Date: Sat, 27 Jun 87 13:26:31 EDT
- TOPAZ From: hedrick@topaz.rutgers.edu
- TOPAZ To: levy@red.rutgers.edu
- TOPAZ Subject: meeting
- TOPAZ
- TOPAZ Let's get together Monday at 1pm.
- TOPAZ .
- RED 250 OK
- TOPAZ QUIT
- RED 221 RED.RUTGERS.EDU Service closing transmission channel
-
- First, note that commands all use normal text. This is typical of the
- Internet standards. Many of the protocols use standard ASCII
- commands. This makes it easy to watch what is going on and to
- diagnose problems. For example, the mail program keeps a log of each
- conversation. If something goes wrong, the log file can simply be
- mailed to the postmaster. Since it is normal text, he can see what
- was going on. It also allows a human to interact directly with the
- mail server, for testing. (Some newer protocols are complex enough
- that this is not practical. The commands would have to have a syntax
- that would require a significant parser. Thus there is a tendency for
- newer protocols to use binary formats. Generally they are structured
- like C or Pascal record structures.) Second, note that the responses
- all begin with numbers. This is also typical of Internet protocols.
- The allowable responses are defined in the protocol. The numbers
- allow the user program to respond unambiguously. The rest of the
- response is text, which is normally for use by any human who may be
- watching or looking at a log. It has no effect on the operation of
- the programs. (However there is one point at which the protocol uses
- part of the text of the response.) The commands themselves simply
- allow the mail program on one end to tell the mail server the
- 16
-
-
-
- information it needs to know in order to deliver the message. In this
- case, the mail server could get the information by looking at the
- message itself. But for more complex cases, that would not be safe.
- Every session must begin with a HELO, which gives the name of the
- system that initiated the connection. Then the sender and recipients
- are specified. (There can be more than one RCPT command, if there are
- several recipients.) Finally the data itself is sent. Note that the
- text of the message is terminated by a line containing just a period.
- (If such a line appears in the message, the period is doubled.) After
- the message is accepted, the sender can send another message, or
- terminate the session as in the example above.
-
- Generally, there is a pattern to the response numbers. The protocol
- defines the specific set of responses that can be sent as answers to
- any given command. However programs that don't want to analyze them
- in detail can just look at the first digit. In general, responses
- that begin with a 2 indicate success. Those that begin with 3
- indicate that some further action is needed, as shown above. 4 and 5
- indicate errors. 4 is a "temporary" error, such as a disk filling.
- The message should be saved, and tried again later. 5 is a permanent
- error, such as a non-existent recipient. The message should be
- returned to the sender with an error message.
-
- (For more details about the protocols mentioned in this section, see
- RFC's 821/822 for mail, RFC 959 for file transfer, and RFC's 854/855
- for remote logins. For the well-known port numbers, see the current
- edition of Assigned Numbers, and possibly RFC 814.)
-
-
-
- 4. Protocols other than TCP: UDP and ICMP
-
-
- So far, we have described only connections that use TCP. Recall that
- TCP is responsible for breaking up messages into datagrams, and
- reassembling them properly. However in many applications, we have
- messages that will always fit in a single datagram. An example is
- name lookup. When a user attempts to make a connection to another
- system, he will generally specify the system by name, rather than
- Internet address. His system has to translate that name to an address
- before it can do anything. Generally, only a few systems have the
- database used to translate names to addresses. So the user's system
- will want to send a query to one of the systems that has the database.
- This query is going to be very short. It will certainly fit in one
- datagram. So will the answer. Thus it seems silly to use TCP. Of
- course TCP does more than just break things up into datagrams. It
- also makes sure that the data arrives, resending datagrams where
- necessary. But for a question that fits in a single datagram, we
- don't need all the complexity of TCP to do this. If we don't get an
- answer after a few seconds, we can just ask again. For applications
- like this, there are alternatives to TCP.
-
- The most common alternative is UDP ("user datagram protocol"). UDP is
- designed for applications where you don't need to put sequences of
- datagrams together. It fits into the system much like TCP. There is
- 17
-
-
-
- a UDP header. The network software puts the UDP header on the front
- of your data, just as it would put a TCP header on the front of your
- data. Then UDP sends the data to IP, which adds the IP header,
- putting UDP's protocol number in the protocol field instead of TCP's
- protocol number. However UDP doesn't do as much as TCP does. It
- doesn't split data into multiple datagrams. It doesn't keep track of
- what it has sent so it can resend if necessary. About all that UDP
- provides is port numbers, so that several programs can use UDP at
- once. UDP port numbers are used just like TCP port numbers. There
- are well-known port numbers for servers that use UDP. Note that the
- UDP header is shorter than a TCP header. It still has source and
- destination port numbers, and a checksum, but that's about it. No
- sequence number, since it is not needed. UDP is used by the protocols
- that handle name lookups (see IEN 116, RFC 882, and RFC 883), and a
- number of similar protocols.
-
- Another alternative protocol is ICMP ("Internet control message
- protocol"). ICMP is used for error messages, and other messages
- intended for the TCP/IP software itself, rather than any particular
- user program. For example, if you attempt to connect to a host, your
- system may get back an ICMP message saying "host unreachable". ICMP
- can also be used to find out some information about the network. See
- RFC 792 for details of ICMP. ICMP is similar to UDP, in that it
- handles messages that fit in one datagram. However it is even simpler
- than UDP. It doesn't even have port numbers in its header. Since all
- ICMP messages are interpreted by the network software itself, no port
- numbers are needed to say where a ICMP message is supposed to go.
-
-
-
- 5. Keeping track of names and information: the domain system
-
-
- As we indicated earlier, the network software generally needs a 32-bit
- Internet address in order to open a connection or send a datagram.
- However users prefer to deal with computer names rather than numbers.
- Thus there is a database that allows the software to look up a name
- and find the corresponding number. When the Internet was small, this
- was easy. Each system would have a file that listed all of the other
- systems, giving both their name and number. There are now too many
- computers for this approach to be practical. Thus these files have
- been replaced by a set of name servers that keep track of host names
- and the corresponding Internet addresses. (In fact these servers are
- somewhat more general than that. This is just one kind of information
- stored in the domain system.) Note that a set of interlocking servers
- are used, rather than a single central one. There are now so many
- different institutions connected to the Internet that it would be
- impractical for them to notify a central authority whenever they
- installed or moved a computer. Thus naming authority is delegated to
- individual institutions. The name servers form a tree, corresponding
- to institutional structure. The names themselves follow a similar
- structure. A typical example is the name BORAX.LCS.MIT.EDU. This is
- a computer at the Laboratory for Computer Science (LCS) at MIT. In
- order to find its Internet address, you might potentially have to
- consult 4 different servers. First, you would ask a central server
- 18
-
-
-
- (called the root) where the EDU server is. EDU is a server that keeps
- track of educational institutions. The root server would give you the
- names and Internet addresses of several servers for EDU. (There are
- several servers at each level, to allow for the possibly that one
- might be down.) You would then ask EDU where the server for MIT is.
- Again, it would give you names and Internet addresses of several
- servers for MIT. Generally, not all of those servers would be at MIT,
- to allow for the possibility of a general power failure at MIT. Then
- you would ask MIT where the server for LCS is, and finally you would
- ask one of the LCS servers about BORAX. The final result would be the
- Internet address for BORAX.LCS.MIT.EDU. Each of these levels is
- referred to as a "domain". The entire name, BORAX.LCS.MIT.EDU, is
- called a "domain name". (So are the names of the higher-level
- domains, such as LCS.MIT.EDU, MIT.EDU, and EDU.)
-
- Fortunately, you don't really have to go through all of this most of
- the time. First of all, the root name servers also happen to be the
- name servers for the top-level domains such as EDU. Thus a single
- query to a root server will get you to MIT. Second, software
- generally remembers answers that it got before. So once we look up a
- name at LCS.MIT.EDU, our software remembers where to find servers for
- LCS.MIT.EDU, MIT.EDU, and EDU. It also remembers the translation of
- BORAX.LCS.MIT.EDU. Each of these pieces of information has a "time to
- live" associated with it. Typically this is a few days. After that,
- the information expires and has to be looked up again. This allows
- institutions to change things.
-
- The domain system is not limited to finding out Internet addresses.
- Each domain name is a node in a database. The node can have records
- that define a number of different properties. Examples are Internet
- address, computer type, and a list of services provided by a computer.
- A program can ask for a specific piece of information, or all
- information about a given name. It is possible for a node in the
- database to be marked as an "alias" (or nickname) for another node.
- It is also possible to use the domain system to store information
- about users, mailing lists, or other objects.
-
- There is an Internet standard defining the operation of these
- databases, as well as the protocols used to make queries of them.
- Every network utility has to be able to make such queries, since this
- is now the official way to evaluate host names. Generally utilities
- will talk to a server on their own system. This server will take care
- of contacting the other servers for them. This keeps down the amount
- of code that has to be in each application program.
-
- The domain system is particularly important for handling computer
- mail. There are entry types to define what computer handles mail for
- a given name, to specify where an individual is to receive mail, and
- to define mailing lists.
-
- (See RFC's 882, 883, and 973 for specifications of the domain system.
- RFC 974 defines the use of the domain system in sending mail.)
-
-
-
- 19
-
-
-
- 6. Routing
-
-
- The description above indicated that the IP implementation is
- responsible for getting datagrams to the destination indicated by the
- destination address, but little was said about how this would be done.
- The task of finding how to get a datagram to its destination is
- referred to as "routing". In fact many of the details depend upon the
- particular implementation. However some general things can be said.
-
- First, it is necessary to understand the model on which IP is based.
- IP assumes that a system is attached to some local network. We assume
- that the system can send datagrams to any other system on its own
- network. (In the case of Ethernet, it simply finds the Ethernet
- address of the destination system, and puts the datagram out on the
- Ethernet.) The problem comes when a system is asked to send a
- datagram to a system on a different network. This problem is handled
- by gateways. A gateway is a system that connects a network with one
- or more other networks. Gateways are often normal computers that
- happen to have more than one network interface. For example, we have
- a Unix machine that has two different Ethernet interfaces. Thus it is
- connected to networks 128.6.4 and 128.6.3. This machine can act as a
- gateway between those two networks. The software on that machine must
- be set up so that it will forward datagrams from one network to the
- other. That is, if a machine on network 128.6.4 sends a datagram to
- the gateway, and the datagram is addressed to a machine on network
- 128.6.3, the gateway will forward the datagram to the destination.
- Major communications centers often have gateways that connect a number
- of different networks. (In many cases, special-purpose gateway
- systems provide better performance or reliability than general-purpose
- systems acting as gateways. A number of vendors sell such systems.)
-
- Routing in IP is based entirely upon the network number of the
- destination address. Each computer has a table of network numbers.
- For each network number, a gateway is listed. This is the gateway to
- be used to get to that network. Note that the gateway doesn't have to
- connect directly to the network. It just has to be the best place to
- go to get there. For example at Rutgers, our interface to NSFnet is
- at the John von Neuman Supercomputer Center (JvNC). Our connection to
- JvNC is via a high-speed serial line connected to a gateway whose
- address is 128.6.3.12. Systems on net 128.6.3 will list 128.6.3.12 as
- the gateway for many off-campus networks. However systems on net
- 128.6.4 will list 128.6.4.1 as the gateway to those same off-campus
- networks. 128.6.4.1 is the gateway between networks 128.6.4 and
- 128.6.3, so it is the first step in getting to JvNC.
-
- When a computer wants to send a datagram, it first checks to see if
- the destination address is on the system's own local network. If so,
- the datagram can be sent directly. Otherwise, the system expects to
- find an entry for the network that the destination address is on. The
- datagram is sent to the gateway listed in that entry. This table can
- get quite big. For example, the Internet now includes several hundred
- individual networks. Thus various strategies have been developed to
- reduce the size of the routing table. One strategy is to depend upon
- "default routes". Often, there is only one gateway out of a network.
- 20
-
-
-
- This gateway might connect a local Ethernet to a campus-wide backbone
- network. In that case, we don't need to have a separate entry for
- every network in the world. We simply define that gateway as a
- "default". When no specific route is found for a datagram, the
- datagram is sent to the default gateway. A default gateway can even
- be used when there are several gateways on a network. There are
- provisions for gateways to send a message saying "I'm not the best
- gateway -- use this one instead." (The message is sent via ICMP. See
- RFC 792.) Most network software is designed to use these messages to
- add entries to their routing tables. Suppose network 128.6.4 has two
- gateways, 128.6.4.59 and 128.6.4.1. 128.6.4.59 leads to several other
- internal Rutgers networks. 128.6.4.1 leads indirectly to the NSFnet.
- Suppose we set 128.6.4.59 as a default gateway, and have no other
- routing table entries. Now what happens when we need to send a
- datagram to MIT? MIT is network 18. Since we have no entry for
- network 18, the datagram will be sent to the default, 128.6.4.59. As
- it happens, this gateway is the wrong one. So it will forward the
- datagram to 128.6.4.1. But it will also send back an error saying in
- effect: "to get to network 18, use 128.6.4.1". Our software will then
- add an entry to the routing table. Any future datagrams to MIT will
- then go directly to 128.6.4.1. (The error message is sent using the
- ICMP protocol. The message type is called "ICMP redirect.")
-
- Most IP experts recommend that individual computers should not try to
- keep track of the entire network. Instead, they should start with
- default gateways, and let the gateways tell them the routes, as just
- described. However this doesn't say how the gateways should find out
- about the routes. The gateways can't depend upon this strategy. They
- have to have fairly complete routing tables. For this, some sort of
- routing protocol is needed. A routing protocol is simply a technique
- for the gateways to find each other, and keep up to date about the
- best way to get to every network. RFC 1009 contains a review of
- gateway design and routing. However rip.doc is probably a better
- introduction to the subject. It contains some tutorial material, and
- a detailed description of the most commonly-used routing protocol.
-
-
-
- 7. Details about Internet addresses: subnets and broadcasting
-
-
- As indicated earlier, Internet addresses are 32-bit numbers, normally
- written as 4 octets (in decimal), e.g. 128.6.4.7. There are actually
- 3 different types of address. The problem is that the address has to
- indicate both the network and the host within the network. It was
- felt that eventually there would be lots of networks. Many of them
- would be small, but probably 24 bits would be needed to represent all
- the IP networks. It was also felt that some very big networks might
- need 24 bits to represent all of their hosts. This would seem to lead
- to 48 bit addresses. But the designers really wanted to use 32 bit
- addresses. So they adopted a kludge. The assumption is that most of
- the networks will be small. So they set up three different ranges of
- address. Addresses beginning with 1 to 126 use only the first octet
- for the network number. The other three octets are available for the
- host number. Thus 24 bits are available for hosts. These numbers are
- 21
-
-
-
- used for large networks. But there can only be 126 of these very big
- networks. The Arpanet is one, and there are a few large commercial
- networks. But few normal organizations get one of these "class A"
- addresses. For normal large organizations, "class B" addresses are
- used. Class B addresses use the first two octets for the network
- number. Thus network numbers are 128.1 through 191.254. (We avoid 0
- and 255, for reasons that we see below. We also avoid addresses
- beginning with 127, because that is used by some systems for special
- purposes.) The last two octets are available for host addesses,
- giving 16 bits of host address. This allows for 64516 computers,
- which should be enough for most organizations. (It is possible to get
- more than one class B address, if you run out.) Finally, class C
- addresses use three octets, in the range 192.1.1 to 223.254.254.
- These allow only 254 hosts on each network, but there can be lots of
- these networks. Addresses above 223 are reserved for future use, as
- class D and E (which are currently not defined).
-
- Many large organizations find it convenient to divide their network
- number into "subnets". For example, Rutgers has been assigned a class
- B address, 128.6. We find it convenient to use the third octet of the
- address to indicate which Ethernet a host is on. This division has no
- significance outside of Rutgers. A computer at another institution
- would treat all datagrams addressed to 128.6 the same way. They would
- not look at the third octet of the address. Thus computers outside
- Rutgers would not have different routes for 128.6.4 or 128.6.5. But
- inside Rutgers, we treat 128.6.4 and 128.6.5 as separate networks. In
- effect, gateways inside Rutgers have separate entries for each Rutgers
- subnet, whereas gateways outside Rutgers just have one entry for
- 128.6. Note that we could do exactly the same thing by using a
- separate class C address for each Ethernet. As far as Rutgers is
- concerned, it would be just as convenient for us to have a number of
- class C addresses. However using class C addresses would make things
- inconvenient for the rest of the world. Every institution that wanted
- to talk to us would have to have a separate entry for each one of our
- networks. If every institution did this, there would be far too many
- networks for any reasonable gateway to keep track of. By subdividing
- a class B network, we hide our internal structure from everyone else,
- and save them trouble. This subnet strategy requires special
- provisions in the network software. It is described in RFC 950.
-
- 0 and 255 have special meanings. 0 is reserved for machines that
- don't know their address. In certain circumstances it is possible for
- a machine not to know the number of the network it is on, or even its
- own host address. For example, 0.0.0.23 would be a machine that knew
- it was host number 23, but didn't know on what network.
-
- 255 is used for "broadcast". A broadcast is a message that you want
- every system on the network to see. Broadcasts are used in some
- situations where you don't know who to talk to. For example, suppose
- you need to look up a host name and get its Internet address.
- Sometimes you don't know the address of the nearest name server. In
- that case, you might send the request as a broadcast. There are also
- cases where a number of systems are interested in information. It is
- then less expensive to send a single broadcast than to send datagrams
- individually to each host that is interested in the information. In
- 22
-
-
-
- order to send a broadcast, you use an address that is made by using
- your network address, with all ones in the part of the address where
- the host number goes. For example, if you are on network 128.6.4, you
- would use 128.6.4.255 for broadcasts. How this is actually
- implemented depends upon the medium. It is not possible to send
- broadcasts on the Arpanet, or on point to point lines. However it is
- possible on an Ethernet. If you use an Ethernet address with all its
- bits on (all ones), every machine on the Ethernet is supposed to look
- at that datagram.
-
- Although the official broadcast address for network 128.6.4 is now
- 128.6.4.255, there are some other addresses that may be treated as
- broadcasts by certain implementations. For convenience, the standard
- also allows 255.255.255.255 to be used. This refers to all hosts on
- the local network. It is often simpler to use 255.255.255.255 instead
- of finding out the network number for the local network and forming a
- broadcast address such as 128.6.4.255. In addition, certain older
- implementations may use 0 instead of 255 to form the broadcast
- address. Such implementations would use 128.6.4.0 instead of
- 128.6.4.255 as the broadcast address on network 128.6.4. Finally,
- certain older implementations may not understand about subnets. Thus
- they consider the network number to be 128.6. In that case, they will
- assume a broadcast address of 128.6.255.255 or 128.6.0.0. Until
- support for broadcasts is implemented properly, it can be a somewhat
- dangerous feature to use.
-
- Because 0 and 255 are used for unknown and broadcast addresses, normal
- hosts should never be given addresses containing 0 or 255. Addresses
- should never begin with 0, 127, or any number above 223. Addresses
- violating these rules are sometimes referred to as "Martians", because
- of rumors that the Central University of Mars is using network 225.
-
-
-
- 8. Datagram fragmentation and reassembly
-
-
- TCP/IP is designed for use with many different kinds of network.
- Unfortunately, network designers do not agree about how big packets
- can be. Ethernet packets can be 1500 octets long. Arpanet packets
- have a maximum of around 1000 octets. Some very fast networks have
- much larger packet sizes. At first, you might think that IP should
- simply settle on the smallest possible size. Unfortunately, this
- would cause serious performance problems. When transferring large
- files, big packets are far more efficient than small ones. So we want
- to be able to use the largest packet size possible. But we also want
- to be able to handle networks with small limits. There are two
- provisions for this. First, TCP has the ability to "negotiate" about
- datagram size. When a TCP connection first opens, both ends can send
- the maximum datagram size they can handle. The smaller of these
- numbers is used for the rest of the connection. This allows two
- implementations that can handle big datagrams to use them, but also
- lets them talk to implementations that can't handle them. However
- this doesn't completely solve the problem. The most serious problem
- is that the two ends don't necessarily know about all of the steps in
- 23
-
-
-
- between. For example, when sending data between Rutgers and Berkeley,
- it is likely that both computers will be on Ethernets. Thus they will
- both be prepared to handle 1500-octet datagrams. However the
- connection will at some point end up going over the Arpanet. It can't
- handle packets of that size. For this reason, there are provisions to
- split datagrams up into pieces. (This is referred to as
- "fragmentation".) The IP header contains fields indicating the a
- datagram has been split, and enough information to let the pieces be
- put back together. If a gateway connects an Ethernet to the Arpanet,
- it must be prepared to take 1500-octet Ethernet packets and split them
- into pieces that will fit on the Arpanet. Furthermore, every host
- implementation of TCP/IP must be prepared to accept pieces and put
- them back together. This is referred to as "reassembly".
-
- TCP/IP implementations differ in the approach they take to deciding on
- datagram size. It is fairly common for implementations to use
- 576-byte datagrams whenever they can't verify that the entire path is
- able to handle larger packets. This rather conservative strategy is
- used because of the number of implementations with bugs in the code to
- reassemble fragments. Implementors often try to avoid ever having
- fragmentation occur. Different implementors take different approaches
- to deciding when it is safe to use large datagrams. Some use them
- only for the local network. Others will use them for any network on
- the same campus. 576 bytes is a "safe" size, which every
- implementation must support.
-
-
-
- 9. Ethernet encapsulation: ARP
-
-
- There was a brief discussion earlier about what IP datagrams look like
- on an Ethernet. The discussion showed the Ethernet header and
- checksum. However it left one hole: It didn't say how to figure out
- what Ethernet address to use when you want to talk to a given Internet
- address. In fact, there is a separate protocol for this, called ARP
- ("address resolution protocol"). (Note by the way that ARP is not an
- IP protocol. That is, the ARP datagrams do not have IP headers.)
- Suppose you are on system 128.6.4.194 and you want to connect to
- system 128.6.4.7. Your system will first verify that 128.6.4.7 is on
- the same network, so it can talk directly via Ethernet. Then it will
- look up 128.6.4.7 in its ARP table, to see if it already knows the
- Ethernet address. If so, it will stick on an Ethernet header, and
- send the packet. But suppose this system is not in the ARP table.
- There is no way to send the packet, because you need the Ethernet
- address. So it uses the ARP protocol to send an ARP request.
- Essentially an ARP request says "I need the Ethernet address for
- 128.6.4.7". Every system listens to ARP requests. When a system sees
- an ARP request for itself, it is required to respond. So 128.6.4.7
- will see the request, and will respond with an ARP reply saying in
- effect "128.6.4.7 is 8:0:20:1:56:34". (Recall that Ethernet addresses
- are 48 bits. This is 6 octets. Ethernet addresses are conventionally
- shown in hex, using the punctuation shown.) Your system will save
- this information in its ARP table, so future packets will go directly.
- Most systems treat the ARP table as a cache, and clear entries in it
- 24
-
-
-
- if they have not been used in a certain period of time.
-
- Note by the way that ARP requests must be sent as "broadcasts". There
- is no way that an ARP request can be sent directly to the right
- system. After all, the whole reason for sending an ARP request is
- that you don't know the Ethernet address. So an Ethernet address of
- all ones is used, i.e. ff:ff:ff:ff:ff:ff. By convention, every
- machine on the Ethernet is required to pay attention to packets with
- this as an address. So every system sees every ARP requests. They
- all look to see whether the request is for their own address. If so,
- they respond. If not, they could just ignore it. (Some hosts will
- use ARP requests to update their knowledge about other hosts on the
- network, even if the request isn't for them.) Note that packets whose
- IP address indicates broadcast (e.g. 255.255.255.255 or 128.6.4.255)
- are also sent with an Ethernet address that is all ones.
-
-
-
- 10. Getting more information
-
-
- This directory contains documents describing the major protocols.
- There are literally hundreds of documents, so we have chosen the ones
- that seem most important. Internet standards are called RFC's. RFC
- stands for Request for Comment. A proposed standard is initially
- issued as a proposal, and given an RFC number. When it is finally
- accepted, it is added to Official Internet Protocols, but it is still
- referred to by the RFC number. We have also included two IEN's.
- (IEN's used to be a separate classification for more informal
- documents. This classification no longer exists -- RFC's are now used
- for all official Internet documents, and a mailing list is used for
- more informal reports.) The convention is that whenever an RFC is
- revised, the revised version gets a new number. This is fine for most
- purposes, but it causes problems with two documents: Assigned Numbers
- and Official Internet Protocols. These documents are being revised
- all the time, so the RFC number keeps changing. You will have to look
- in rfc-index.txt to find the number of the latest edition. Anyone who
- is seriously interested in TCP/IP should read the RFC describing IP
- (791). RFC 1009 is also useful. It is a specification for gateways
- to be used by NSFnet. As such, it contains an overview of a lot of
- the TCP/IP technology. You should probably also read the description
- of at least one of the application protocols, just to get a feel for
- the way things work. Mail is probably a good one (821/822). TCP
- (793) is of course a very basic specification. However the spec is
- fairly complex, so you should only read this when you have the time
- and patience to think about it carefully. Fortunately, the author of
- the major RFC's (Jon Postel) is a very good writer. The TCP RFC is
- far easier to read than you would expect, given the complexity of what
- it is describing. You can look at the other RFC's as you become
- curious about their subject matter.
-
- Here is a list of the documents you are more likely to want:
-
- rfc-index list of all RFC's
-
- 25
-
-
-
- rfc1012 somewhat fuller list of all RFC's
-
- rfc1011 Official Protocols. It's useful to scan this to see
- what tasks protocols have been built for. This defines
- which RFC's are actual standards, as opposed to
- requests for comments.
-
- rfc1010 Assigned Numbers. If you are working with TCP/IP, you
- will probably want a hardcopy of this as a reference.
- It's not very exciting to read. It lists all the
- offically defined well-known ports and lots of other
- things.
-
- rfc1009 NSFnet gateway specifications. A good overview of IP
- routing and gateway technology.
-
- rfc1001/2 netBIOS: networking for PC's
-
- rfc973 update on domains
-
- rfc959 FTP (file transfer)
-
- rfc950 subnets
-
- rfc937 POP2: protocol for reading mail on PC's
-
- rfc894 how IP is to be put on Ethernet, see also rfc825
-
- rfc882/3 domains (the database used to go from host names to
- Internet address and back -- also used to handle UUCP
- these days). See also rfc973
-
- rfc854/5 telnet - protocol for remote logins
-
- rfc826 ARP - protocol for finding out Ethernet addresses
-
- rfc821/2 mail
-
- rfc814 names and ports - general concepts behind well-known
- ports
-
- rfc793 TCP
-
- rfc792 ICMP
-
- rfc791 IP
-
- rfc768 UDP
-
- rip.doc details of the most commonly-used routing protocol
-
- ien-116 old name server (still needed by several kinds of
- system)
-
- ien-48 the Catenet model, general description of the
- 26
-
-
-
- philosophy behind TCP/IP
-
- The following documents are somewhat more specialized.
-
- rfc813 window and acknowledgement strategies in TCP
-
- rfc815 datagram reassembly techniques
-
- rfc816 fault isolation and resolution techniques
-
- rfc817 modularity and efficiency in implementation
-
- rfc879 the maximum segment size option in TCP
-
- rfc896 congestion control
-
- rfc827,888,904,975,985
- EGP and related issues
-
- To those of you who may be reading this document remotely instead of
- at Rutgers: The most important RFC's have been collected into a
- three-volume set, the DDN Protocol Handbook. It is available from the
- DDN Network Information Center, SRI International, 333 Ravenswood
- Avenue, Menlo Park, California 94025 (telephone: 800-235-3155). You
- should be able to get them via anonymous FTP from sri-nic.arpa. File
- names are:
-
- RFC's:
- rfc:rfc-index.txt
- rfc:rfcxxx.txt
- IEN's:
- ien:ien-index.txt
- ien:ien-xxx.txt
-
- rip.doc is available by anonymous FTP from topaz.rutgers.edu, as
- /pub/tcp-ip-docs/rip.doc.
-
- Sites with access to UUCP but not FTP may be able to retreive them via
- UUCP from UUCP host rutgers. The file names would be
-
- RFC's:
- /topaz/pub/pub/tcp-ip-docs/rfc-index.txt
- /topaz/pub/pub/tcp-ip-docs/rfcxxx.txt
- IEN's:
- /topaz/pub/pub/tcp-ip-docs/ien-index.txt
- /topaz/pub/pub/tcp-ip-docs/ien-xxx.txt
- /topaz/pub/pub/tcp-ip-docs/rip.doc
-
- Note that SRI-NIC has the entire set of RFC's and IEN's, but rutgers
- and topaz have only those specifically mentioned above.
-
-
-
-
-
- 27
-